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To approximate functions of a single variable by using linear 
interpolation is routine in empirical studies. Here, we consider ap- 
proximating functions of several variables in a similar piecewise 
linear manner. We focus on the nontrivial part of this technique, 
which is that of choosing the appropriate "pieces" for the piecewise 
linear approximation. Precisely, we seek to identify the best interpo- 
lants to use at a point of interpolation. This is not an issue for 
functions of a single variable, since the linear ordering of the number 
line leaves us with no choice. For functions of several variables, we 
propose several simple tools to help uncover undesirable choices. The 
techniques presented are useful in the empirical study of quantita- 
tively complex functional relationships whose qualitative behavior is 
nevertheless known and simple. Response time relationships para- 
metrized by workloads in computer performance modeling often fall 
into this category, and an actual bivariate function of this type is 
used to motivate the development. 

I. INTRODUCTION 

Linear approximation is a popular and economical way to gain 
appreciation of the behavior of functional relationships. Especially in 
higher dimensions, making sense out of a sample of data points in 
terms of the underlying multivariate relationship is greatly facilitated 
by some form of piecewise linear approximation. Such an approxima- 
tion allows for estimation of the function at values not included in the 
sample, sheds light on the activity of the function at selected neigh- 
borhoods, and identifies regions where the relationship behaves inter- 
estingly. 

For the sake of concreteness let us consider a typical problem. A 
simulation model of a computer system has been given. The workload 
driving the system is described by two variables giving the respective 
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percentages of two classes of users in the user population; there are 
three classes of users altogether. The two variables, then, form a 
bivariate parametrization of the workload. Six simulations were run, 
and the resulting mean response times are shown below. (This response 
time function will be denoted by h throughout this paper.) 

h (60, 25) = 5.5 

A (60, 7) =2.1 
h (40, 30) = 1.2 

h (40, 7) = 1.0 

A(20,25) = 0.7 

h (20, 15) = 0.7 

It will be necessary to estimate the response times for many more 
workloads than for the six simulations already run. Yet, it would be 
inefficient to make a run for each possible parameter pair (xi, X2). It 
would also not be necessary, given the qualitatively simple relationship 
that generally exists between workloads and response times of com- 
puter systems. In this particular case, the function is expected to be 
monotone. Therefore, the six simulations not only give us the values 
of the function at those six points, but also the values between and 
around them, that is, at least approximately. For example, we could 
safely assert that h (55, 10) should he between 1.0 and 5.5. In fact, it 
would be reasonable to estimate the range to be from 1.8 to 5.5. This 
becomes obvious by plotting the six-parameter pairs used in the 
simulations, plus the point P = (55, 10) on the (x u x 2 ) plane, as shown 
in Fig. 1. The points A through F are labeled in the order in which 
they were collected; point A is the oldest. Since the polygon with 
corners A, D, B, and E bound the point (55, 10), it is reasonable to 
deduce that the values of h at A, D, B, and E bound the value of h at 
(55, 10). This gives the interval (1.0, 5.5). Furthermore, h(55, 7) may 
be estimated as 1.8 by linear interpolation between h (40, 7) = 1.0 and 
h (60, 7) = 2.1. Since h is increasing in both X\ and X2, h(55, 10) =■ 
h(55, 7). This gives the sharper interval (1.8, 5.5). 

As we will see, this approach can be continued until a numerical 
estimate for h (55, 10) is reached. Two steps are required. The first 
step identifies the data points that contain relevant information about 
7i(55, 10). For example, we have already rejected C and F as irrelevant 
to P. This was simple and was done "by eye." Finer methods need be 
developed, however, to determine which one of A, B, D or E should be 
further discarded. The major part of this paper, starting with Section 
III, deals with this and related problems. The second step consists of 
fitting a linear function over the data points chosen as a result of the 
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Fig. 1 — Data points for case study. 

first step; the mechanics of linear fitting will be reviewed in Section II. 
Together, the two steps produce the "best" way to fit the data points 
in a piecewise linear way. The concluding recapitulation gives an 
overview of the methodology in an iterative context. 

Formulas used in our treatment are gathered in a sequence of 
propositions. Since they are rather straightforward and our interest is 
in their usage, only abbreviated arguments for their validity are 
included. All calculations may be easily carried out on a computer. 
Needed, aside from basic arithmetic, are routines to define and manip- 
ulate matrices, perform matrix multiplications and inversions, take 
determinants, and find eigenvectors of symmetric matrices. Compu- 
tations for our examples have been carried out with relatively short 
programs in the S statistical package. 1 

Our proposed approach is not meant to replace the standard practice 
of fitting a single, global functional relationship to the data. The two 
procedures reveal different aspects of the data. Function fitting, being 
inherently global, is well suited for capturing the overall behavior of 
the relationship. This we hope to complement with the piecewise 
linear approach, which, being inherently local, is better suited for 
pinpointing places where interesting things happen to the function. 
This is especially relevant for iterative empirical studies, where it is 
useful to know where the function is active in the parameter space so 
that further experiments can be fruitfully specified. 

II. THE BASIC INTERPOLATION 

Linear interpolation for functions of a single variable is usually 
taught in high school in terms of such notions as similar triangles and 
slopes. For the purpose of generalization to higher dimensions, an 
alternative, though algebraically equivalent, viewpoint is preferred. 
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Specifically, let f(x\) = y\, f(x2) = y 2 , and x\ < x < x 2 . It can be shown 
that the usual linear interpolation gives the estimate 

a\y\ + a 2 y2, 

where ai and a 2 satisfy 

OiXi + CL 2 X 2 = x, 
cti + a 2 = 1, 

tti, flfe> 0. 

In other words, to estimate f(x) for some x in the interval Xi < x 2 , 
express a: as a convex combination (weighted average) of X\ and x 2 , 
then estimate f(x) as the same convex combination of f(xi) and f(x 2 ). 
Notations: As usual, R" is the set of n-dimensional real column vectors. 
Also standard is the use of the prime notation for matrix transposition, 
as in M'. If the y'th column of matrix M is m,, we write M = 
(mi, • • • , m*). The length of a vector v = (vi, • • • , v n )' is denoted by 

||v|| = V(!;? +...+!/»). 

For any x G R n , let x* E R n+1 denote 

x 

1 

i.e., the vector consisting of x followed by the singleton 1. If / is a 
function, /denotes its approximation by linear interpolation. 

A higher dimensional analogue of having the points Xi , x 2 generate 
an interval in one dimension is having points Xi, X2, X3 generate a 
triangle in two dimensions and having xi , x ; > , x :l , x. t generate a pyramid 
in three dimensions. Just as we require that an interval in one dimen- 
sion should not shrink to a point, we require that a triangle 
should contain area on a plane rather than reduce to a line segment, 
and that a pyramid should contain volume in solid space rather than 
collapse onto some plane. In general, we are concerned with the convex 
hull of (n + 1) points x x , ■ • • , x„ + i e R", subject to the condition that 
these points do not lie in some lower (<n) dimensional hyperplane. 
The matrix formulation of this geometric requirement is that the linear 
transformation X given by 

X = (Xi — Xn+l, • • • , X n — X n +l) 

should have the trivial kernel, or, by the rank-nullity theorem, satisfy 
det(X) ^ 0. 

Definition: An n-dimensional pyramid is the convex hull of a set of 
(n + 1) vectors {xi, • • • , x„+i} C R n such that 

det(Xi - Xn+l, • • ■ , Xn — x„+i) ^ 0. 
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The points x, are called its vertices, and any subset of n vertices forms 
a face of the pyramid. 

We will often call a set of points a pyramid, rather than a convex 
hull generated by these points, as would be strictly correct; however, 
this makes no difference since there is a one-to-one correspondence 
between the set of vertices and the pyramid it generates. 
Proposition 1: Let /: R" -> R be a function of n variables. Let 
{xi, ••• , x„+i} C R" be (i.e., generate) an n-dimensional pyramid 
containing x£R", and let /"(x,) = yj. Then, 

(i) X* = (xf, • • • , x* + i) is an invertible (n + 1) X (n + 1) matrix, 
and 

(ii) the estimate /(x) = (y u • • • , ^„ + i)(X*) _1 x* is a higher dimen- 
sional generalization of linear interpolation for a single variable. 

Proof: 

(i) If X* were not invertible, its columns would be linearly depend- 
ent. Therefore, one of the vertices would be a convex (not necessarily 
positive) combination of the n remaining vertices, implying that the n 
+ 1 points lie in (n — 1) -dimensional hyperplane at most. This contra- 
dicts the definition of a pyramid. 

(ii) Let a = (X*) _1 x* = («,,-••, a n+ i)'. Since x* = X*a, it follows 
from the star (*) definition that 

x = «iXi + • • • + a n +iX„+i, 

Ol + • • • + Otn+l = 1 • 

Thus, a expresses x as a convex combination of {xi, • • • , x„+i}. Since 
x is contained in the convex hull, a ^ 0. Hence, the linear interpolation 
at x should be 

/(X) = OCiyi + ... + Oln+iyn+U 

which is as proposed. 

In practice, we will not know whether x is contained in the convex 
hull. Therefore, the vector of coefficients a should be explicitly com- 
puted to check that a ^ 0. The use of the procedure with negative 
components in a corresponds to linear extrapolation. 
Example: Returning to the computer system response time function 
h, we see from the (x lt x 2 ) plot in the previous section that 

«™-{(s). (")■ (")} 

forms a two-dimensional pyramid, or triangle, containing 

'55^ 



P ='10 
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Alternatively, if plots are not feasible, as would be the case in higher 
dimensions, we calculate 

, /40-60 40-60\ dan , n 

checking that a noncollapsing pyramid is obtained, and 
/40 40 60\ _1 /55\ /0.13\ 



checking that 



is contained in the pyramid. Interpolation according to the formula 
gives 

/0.13\ 
(1.2 1.0 2.1) 0.12 = 1.85. 



,0.75, 



Similarly, 



«4. «-{(-). (1 



40\ /60\ /60 
25 



is a pyramid containing 




and interpolation using this pyramid gives 

/40 60 60\ _1 
(1.0 2.1 5.5) 7 7 25 J 



III. THE SELECTION PROBLEM 

We have just seen how two different choices of pyramids can lead to 
two markedly different approximations. Taking the average and ap- 
proximating 

^10 

by 16(1.85 + 2.39) = 2.12 is not justified. An average is appropriate 
when summarizing a batch of numbers that have been made different 
by random error, since the averaging process "zeroes out" the random 
errors. In our case, the difference between the two approximations 
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does not arise from random error. One of the choices is better than the 
other, and that is not a probabilistic phenomenon. 

Another way to proceed is to put greater trust on closer interpolants 
than on further ones and choose the pyramid yielding the least- 
summed distance from its vertices to that point. Under this criterion, 
the pyramid ADE should be preferred over ABE, since D is closer to 
P than B is by nine units. Unfortunately, the notion of distance is not 
invariant to the choice of scales on the axes, and the criterion becomes 
sensitive to changes of units in the independent variables. The dis- 
tance-type criteria have a further, more fundamental weakness, which 
they share with, among others, the criterion of Lawson. 2 

As used by Akima 3 in his algorithm for bivariate smooth interpola- 
tion, Lawson's criterion triangulates the (x u x 2 ) plane in such a way 
that minimum interior angles of triangles are maximized. The objective 
is to set up as many "fat" triangles as possible. Under such "fatness- 
type" criterion, ADE is again chosen over ABE, since ABE comes with 
the companion triangle ABD, which is too "skinny." An extremely 
skinny triangle is undesirable for two reasons. First, its X* may be 
numerically unstable to invert. Second, it biases the estimation along 
a particular direction in the (xi , x 2 ) -plane. (On the other hand, we will 
see in Section VI that thoughtful biasing may be beneficial.) It is not 
necessary, however, to make fatness an overall criterion just to avoid 
such excesses. 

The problem with these distance- and fatness-type criteria is that 
only the configuration of the x variables is used in assessing pyramids. 
(With the curve-fitting approach that uses Lagrangian and trigono- 
metric polynomials, 4 even this information is not taken into account.) 
This is an unnecessary restriction. Surely the values of the function at 
the interpolants, the /(x)'s, have much to say on the adequacy of an 
interpolation. Our basic working tool to tie x together with /"(x) is the 
notion of steepest ascent. Using it, we show that the correct pyramid 
to choose is ABE, rather than the pyramid ADE, favored by both 
criteria above. Incidentally, the ABE choice for (55, 10) was confirmed 
empirically by an actual simulation run which gave the result 
h(55, 10) = 1.74. Recall that interpolation with ABE gives 1.85, 
representing a 6.3 percent error, while interpolation with ADE gives 
2.39, representing a 37.4 percent error. The fact that the interpolation 
overestimates could also have been predicted. See Section VII, where 
assumptions underlying our approach are discussed. 

IV. STEEPEST ASCENT 

Equipped with the mechanics of basic interpolation, we may analyze 
in detail the function's interpolated behavior on a given pyramid. We 
would like to know along which direction inside the pyramid the 
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function increases, and how fast that increase is. Because of the 
linearity of the interpolation, both questions have well-defined an- 
swers. The geometric way to derive an answer is to slice the pyramid 
into parallel "contour hyperplanes," which are systems of hyperplanes 
such that points on the same hyperplane have the same interpolated 
values. 

Example: Figure 2 shows some contour hyperplanes (contour lines in 
two dimensions) corresponding to the pyramid (triangle) ABE. Points 
along the downward-sloping lines take the same interpolated values as 
labeled. Clearly, h increases most rapidly along the direction of the 
arrow, which is perpendicular to the contour lines. 

Since the contour hyperplanes are parallel, a single vector perpen- 
dicular to all of them can be found. Actually, there will be many such 
vectors, but they can only have one of two opposite directions, one for 
increasing f and another for decreasing /. We agree to take the 
increasing direction. Being perpendicular to the contour hyperplanes, 
this direction has no component along which f does not increase. 
Therefore, it is the direction along which / increases the fastest. We 
define a unit vector in this direction to be the direction of steepest 
ascent of the function / in the pyramid. The rate of steepest ascent is 
the increase off along the direction of steepest ascent per unit distance 
traveled. 

It remains to formulate these geometric notions in matrix terms. 
Proposition 2: Let /: R n —* R be a function of n variables, {xi, • • • , 
x„+i } C R" be a pyramid, /"(x,-) = j,, and X* = (xf, • • • , x*+i ). Let 

(d, ••• , a„+i) = (y u ••• ,y n +i) (X*)" 1 . 

(i) The direction of steepest ascent in {xi, • • • , x n+] } is (a it • • • , 
a n )/\\(a u ••• , a n )\\, and 

(ii). The rate of steepest ascent in {xi, • • • , x„+i} is ||(ai, • • • , a n )\\. 




1.4 1.6 1.8 A 



Fig. 2 — Contour lines for ABE. Function increases in direction of arrow. 
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Proof: By definition of (a,, •••, a n +\), the interpolation fits the 
function 

f(xu • • • , x n ) = aiX\ + • • • + a n x n + a„+i 

on the pyramid {xi , • • • , x„+i }. 

To show (i), note that/(u) = f(v) implies that (a u • • • , a„) (u - v) 

= 0. To show («'), note that /((a,, ••• , a„)/\\(a u • •• , a n )\\) -/(0) = 

||(ai, •••,a„)||. 

Example: Table I lists some directions and rates of steepest ascent 

that we will use later. 

V. A FIRST APPROACH: JUDGING BY FACES 

Example: We are now ready to reject ADE in favor of ABE. Actually, 
since ADE and BDE form a companion pair, we will reject them both. 
This will be accomplished by analyzing the face DE. In Fig. 3, ADE 
and BDE are shown along with their respective directions of steepest 
ascent loosely placed about their centers. The two directions of steep- 
est ascent make sense separately. Both show response time as an 
increasing function of x, and x 2 . Together, however, they show that 
DE is a ridge, i.e., h increases as DE is approached from either side. 



Table I — Some directions and rates 
of steepest ascent 



Pyramid 



Rate of 

Direction of Steepest 

Steepest Ascent Ascent 



ADE 


(0.29, 0.96)' 


0.197 


BDE 


(0.99, 0.05)' 


0.218 


ABD 


(0.81, 0.58)' 


0.323 


ABE 


(0.98, 0.19)' 


0.059 




Fig. 3— ADE and BDE with their directions of steepest ascent. 
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Alternatively, a ridge indicates that h is not monotone along some 
direction on the (x\, Xz) plane. From our knowledge of h, no such ridge 
should exist. The triangulation {ADE, BDE } incorrectly shows a ridge 
because h(D) is inordinately large. The triangulation allows D to pull 
up both triangles, causing their boundary to buckle. The correct way 
to picture h is shown in Fig. 4. For low values of X\ and X2, h grows 
moderately (rate of steepest ascent = 0.059 on ABE), with Xi being 
mainly responsible for the increase. As x\ and x 2 become large, 
h explodes (rate of steepest ascent = 0.323 on ABD), and x 2 begins to 
affect h seriously, although X\ is still the major contributor. Because 
of the piecewise linearity of interpolation, AB has become an acceler- 
ating, refracting boundary but not a ridge. In Section VI, we will see 
another reason why ADE is a poor choice. 

Note that specific knowledge about h was invoked to make the 
selection: We knew that response time as a function of workload had 
no ridges. In general, linear interpolation of functions with known 
major ridges, valleys, maxima, or minima should be avoided. An 
exception may be made if we are able to sprinkle such tricky terrain 
generously with further interpolants. Under finer resolution, the vol- 
atile formations should diminish. Of course, even if a ridge does exist, 
it is highly improbable that it should coincide with one of our faces. 
Definition: A face is a bad face if there are two pyramids containing it, 
and their directions of steepest ascent either both point towards it or 
both point away from it. 

Thus, we accept as likely those faces that may refract and/or 
accelerate steepest ascent vectors, but we question faces that gather 
or scatter them. 

Matrix techniques are clearly needed to identify bad faces in higher 
dimensions, where graphs are not feasible. It is enough to find a 
technique to determine if a given steepest ascent approaches a given 




Fig. 4 — ABE and ABD with their directions of steepest ascent. 
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face. Since approaching a face at 90 degrees is different from approach- 
ing it at 0.0001 degree, the angle of approach is also of interest. Note 
that knowledge of the angle alone does not tell us whether the ascent 
approaches; the ascent may be located on either side of the face. 
Proposition 3: Let a e R n be a steepest ascent for the pyramid 
{xi, • • • , x n+ i} C R". Let {xi, • • • , x„} = F be the face of interest. 
Define the n X (n - 1) matrix 

X = (Xi — Xn, X2 — Xn, ■ ■ * , X„-i — X„) . 

Also define the n X n matrix 

Y = (-X, a) = (x„ - xi, x„ - x 2 , • • • , x„ - x„_i, a). 

(i) Let = (ft, • • • ,pnY = y _1 (x„ - x„ + i). If /?„ >0, a approaches 
F. If /?„ < 0, a recede s from F. If Y is not invertible, a is parallel to F. 
{ii) Let cos = Va'X(X'X) _1 X'a. Then is the angle between a 
andF. 

Proof: 

(i) The trick here is to pull the vertex x„ +i to the origin so that we 
may examine the relationship between a, now sitting at the origin, and 
F, now translated to {x! - x n+l , • • • , x„ - x„+, } . For as given above, 
let in e = Pi far i - 1, • • • , n - 1, 7r„ = 1 - (tt-i + • • ■ + ir n -i ), and t = 
/?„. The reader may check that 

ta. = Y, vtfc — x„+i), 



i=i 



1-1 
The interpretation of the sign of fi n follows from the trick just de- 
scribed. 

(ii) The trick is to translate the face to the origin by x„. We then 
regress a against the translated face {xi - x„, • • • , x„-i - x„ } so that 
its projection proj(a) onto the linear subspace spanned by the trans- 
lated face may be found by the usual formula. Now take the inner 
product between a and proj(a), bearing in mind that a is a unit vector. 
Example: Using the formulas above, we find the DE is approached by 
the direction of steepest ascent of BDE at 31 degrees, and DE is also 
approached by the direction of steepest ascent of ADE at 39 degrees. 
Thus, DE is a ridge. 

VI. ANOTHER APPROACH: THE AXES CRITERION 

Example: Let's consider again the selection problem with the response 
time function y = h (x i , x 2 ), but this time we consider the complemen- 
tary pair of triangles ABD and BDE, shown in Fig. 5. While neither 
triangle is very skinny, we could roughly identify directions along 
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which they stretch out. For example, ADB stretches out in some 
direction along AB, which is quite perpendicular to its direction of 
steepest ascent. BDE, on the other hand, stretches somewhat along 
DE, which is not at all perpendicular to its direction of steepest ascent. 
This difference affords another basis for selection. Let's exaggerate 
somewhat and give the two triangles more definitive and mutually 
perpendicular directions of stretch; we also assume that the "direction 
of steepest ascent of the function" (a vague notion), denoted by a, is 
perpendicular to the stretch of ABD. The situation is shown in Fig. 6. 
We may now rotate the configuration to new axes (x[ t x'2) so that the 
pair of triangles stretch in directions parallel to the new axes, as in Fig. 
7. Because of our assumption, a becomes parallel to x\. This means 
that x'2 has no effect on y, so we may plot y as a function of x\ alone. 




a, = ASCENT IN ABD 
a 2 = ASCENT IN BDE 

Fig. 5 — Triangles ABD and BDE with directions of steepest ascent. 




Fig. 6 — ABD and BDE stretched in mutually perpendicular directions. 
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Since a points along x[ , y is an increasing function of x[ . If y is a linear 
function of xi , it will also be a linear function of (x i , x 2 ), in which case 
it doesn't matter which triangle we select. Both will give perfect fit. 
However, for h, we must protect ourselves against nonlinearity. We 
actually know that h is concave downwards as a function of xi. (If h 
is concave upwards, the same argument below holds.) The elimination 
of x' 2 leads to the (xi, y) plot given in Fig. 8. Clearly, ABD hugs the 
function much better than BDE. By stretching linearly far along the 
direction of ascent, BDE loses touch of the underlying, nonlinear 
function. Here is the second reason why ABE is a better choice than 
ADE. 

The moral is that triangles perpendicular to their directions of 
steepest ascent give better estimates. The same is true in higher 
dimensions, although in higher dimensions it is not appropriate to talk 
about the direction of stretch of a pyramid because a pyramid may 
stretch in several directions at once. In three dimensions, for example, 



2 I 




Fig. 7— Rotation of Fig. 6. 




V =h(x- y ) 



Fig. 8 — Elimination of x 2 . 
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a solid pyramid may stretch out in one single direction, as in a needle- 
like pyramid, or in two directions, as in a pancake-like pyramid. 
Therefore, we will appeal to the notion of the axes of a pyramid — the 
system of coordinates that best aligns with the vertices of the pyramid, 
such that the first coordinate aligns with the furthest stretch of the 
pyramid, ■ • • , the last coordinate aligns with the shortest stretch of 
the pyramid. The coordinates {x\, x'2) found above by rotation are an 
informal, visual example. Formally, it is necessary to define "best 
alignment." For convenience, we will define it in terms of least-squared 
distances. Then, we may use the principal component technique from 
multivariate statistical analysis. 5 
Definition: Let {xi , • • • , x„+i} C R" be a pyramid with mean vector 

X = (Xi + • ■ • + Xn+i ) • 

n + 1 
Define the nX (n + 1) matrices 

X = (xi, •• • , x*+i), X = (x, •••,x). 

Define the n X n covariance matrix 

V = -(X-X)(X-X)'. 
n 

Let Vi , • • ■ , v„ be the eigenvectors of V with eigenvalues Xi ^ • • • ^ 
X n . (That is, Vv, = A,v,.) Then Vi is the first axis of the pyramid, V2 is 
the second axis of the pyramid, • • • , v„ is the last axis. 

The eigenvalue A, tells us how far the pyramid stretches along v,. 
Thus, a four-dimensional pyramid with eigenvalues 1000 ^ 10 ^ 10 ^ 
10 stretches out mainly along Vi, while a four-dimensional pyramid 
with eigenvalues 1000 ^ 500 ^ 10 ^ 10 stretches mainly along Vi and 
V2, with the stretch along Vi being significantly larger than the stretch 
along V2 . A pyramid with identical eigenvalues does not stretch in any 
direction and is "fat." The eigenvalues cannot be for pyramids. 

The use of the axes Vi , • • • , v„ in evaluating a pyramid is as follows: 
if a is its direction of steepest ascent, then the ideal pyramid 
{xi , ■ • • , x„+i } satisfies 

a'vi = 



a'v„-i = 0. 

Thus, its first (n — 1) axes are perpendicular to the direction of steepest 
ascent, leaving the last axis, the direction of shortest stretch of the 
pyramid, to coincide with its direction of steepest ascent. When com- 
paring less-than-ideal pyramids, we choose the one where | a'v, | is close 
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to for as many of the first Vi, • • • , v m as possible. Note that a'v, is 
the cosine of the angle between a and v,. 

Example: Using the formulas given above, Table II has been com- 
pleted. Note that Vi and v 2 are mutually perpendicular, as they must 
be. From the relatively high ratios of A1/A2, it follows that all four 
triangles have fairly prominent major (first) axes, i.e., they stretch 
along some distinct direction. ABD and ABE form the narrower pair 
of triangles. By comparing | a'vi | , it follows that ADE and BDE tend 
to stretch along their respective directions of steepest ascent, while 
ABD and ABE are more perpendicular to their respective steepest 
ascents. Therefore, we chose the triangulation {ABD, ABE) over the 
triangulation {ADE, BDE). 

VII. ASSUMPTIONS 

Now we spell out some assumptions about the underlying function 
that are relevant to the validity of our linear interpolation. 

The foremost assumption is that the function is reasonably smooth 
with respect to the mesh of the interpolants. Without such basic 
optimism, which is buttressed by the prerogative of adding further 
interpolants to refine the mesh, systematic investigation could not 
proceed. As described in Ref. 6: "The experimenter is like a person 
attempting to map the depth of the sea by making soundings at a 
limited number of places. • • • mapping a surface resembling a nest of 
stalagmites or the back of a porcupine would be impossible • • • since 
characteristics of the surface at one point would not be related to 
characteristics elsewhere." Under the smoothness assumption, the axis 
analysis presented in Section VI is valid, since it simply recommends, 
in quantitative terms, the prudence of spacing out the interpolants 
only where the function appears to behave uneventfully. 

The face criterion of Section V applies when the function is mono- 
tone, in addition to being smooth. (Monotonicity as defined here is 
more stringent than what is usually required.) 

Definition: A multivariate function /": R n -* R is monotone if one of 
the following holds for any x,y£ R n : 

(i) For all a, fi e (0, 1), a > fi -* /(ax + (1 - a)y) ^ /"(/?x + 
(1 - P)y), or 

Table II — Axes analysis for typical problem 



Trian- Steepest As- 
gle cent a 



Eigenvectors vi, vs 



Eigenvalues 

Xi, A2 a^i 



ADE (0.29, 0.96)' (-0.78, -0.63)', ( 0.63, -0.78)' 

BDE (0.99, 0.05)' (-0.65, -0.76)', ( 0.76, -0.65)' 

ABD (0.81,0.58)' ( 0.68, -0.73)', ( 0.73, 0.68)' 

ABE (0.98, 0.19)' (-0.60, 0.80)', (-0.80, -0.60)' 



182.0, 59.3 -0.83 

183.7, 96.0 -0.68 

233.4, 46.3 0.13 

234.5, 75.2 -0.44 
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(ii) For all a, /? E (0, 1), a > (S -+ /(ax + (1 - a)y) % /(/?x + 

(l-^)y). 

That is, / either increases or decreases consistently along whichever 
direction one travels in the x-space. 

Often, specific knowledge of the function includes its monotonicity. 
With our h, for example, it is unlikely that any direction should be 
singled out along which h "rollercoasts." 

A further useful assumption is convexity (concavity). 
Definition: A function /: R n — > R is convex on a convex region AcR" 
if for any x,y6i and a E. (0, 1), 

/(ax + (1 - o)y) Si a/(x) 4- (1 - a)/(y). 

The function / is concave if 

f(ax + (1 - a)j) i= af(x) + (1 - a)f(y). 

If /"is convex on the region A, the interpolated function/ satisfies 

/(x)^/(x)(xEA). 

(Similarly, if f is concave, we have f ^ f.) Hence, knowledge of the 
convexity (concavity) of /permits the conclusion that the interpolated 
approximations are the upper (lower) bounds to the true values of/. 
Since h appears to be convex on the hexagon ABCDEF, the response 
time estimates obtained from linear interpolation are likely to be 
worst-case estimates. This agrees with actual data: 

h (55, 10) - 1.74 ^ 1.85 = £(55, 10). 

Proposition 4: Let ||a|| be the rate of steepest ascent in pyramid P = 
{xi, • • • , x„+i } C R n . Let ri, ■ • • , n be the rates of steepest ascent in 
all pyramids that share some face F with P such that a recedes from 
F. Similarly, let s i , •••,*/ be the rates of steepest ascent in all 
pyramids sharing some face with P that is approached by a. Then, for 
monotone /: 

(i) It is consistent with the convexity of /on P that 

ri, ••.,77§||a||Ssi, ■•• ,sj. 

(ii) It is consistent with the concavity of /on P that 

n, >",rj&||a||&«i, ••• , s,. 

Example: It follows from Fig. 9 that h is likely to be convex on ABE, 
since 0.025 < 0.059 < 0.323. 

VIII. MORE ON THE SECOND APPROACH: A GLOBAL METRIC 

Selecting the "best" system of pyramids was straightforward in our 
example with ABE and ADE, since the number of triangles was few 
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Fig. 9 — Convexity of h. 

and there was no trade-off between the two systems. When comparing 
systems of pyramids involving possibly dozens of pyramids each, and 
when each system contains both "good" and "bad" pyramids, individ- 
ual assessments of pyramids must be combined into an overall measure 
to facilitate the comparison. A combined, global measure of the opti- 
mally of a system of pyramids is also necessary for the application of 
our procedure on a computer. We now derive such a metric under the 
principles discussed in Section VI. 

Definition: Let P = {xi, • • • , x„ + i } C R" be a pyramid with associated 
steepest ascent vector a. Let X 1 > • • • > A„ > be the eigenvalues 
derived from P as in Section VI, and let Vi, • • • , v« be the correspond- 
ing eigenvectors. Define 



I M«v,| 



m(P) = 



1=1 



In the above expression, m (P) is intended to measure the departure of 
P from optimality. From the weighted-average form of its definition, 
it follows that a large value of m (P) is caused by large values of | a'v, | 
for the larger values of A,. Equivalently, it implies alignment of the 
major axes of P along a. An optimal P, of course, will do the opposite: 
As seen in Section VI, it will align its least-significant axes along a. 
Definition: Let {P,, • • • , P,,,} be a system of pyramids. Let Vol(Py) 
denote the volume of Pj. [Recall that Vol({x,, •••, x n+ i}) = 
det({xT, • • • , x* + i })/"!; see (Ref. 7, p. 331).] Define 



M({P U ...,P m })= ZVoUP^miPj). 

7=1 
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The expression M({P U • • • , P m }) is the proposed overall metric of the 
departure of the system {Pi , • • • , P m ) from optimality. It sums the 
nonoptimality of each of its constituent pyramids, giving greater weight 
to the more voluminous pyramids. Note that it is not necessary to 
normalize M by the sum 

m 

£ VoUPy), 

y-i 

since two comparable systems of pyramids will cover the same total 

volume. 

Example: Applying the definitions above, 

m(ADE) = (182.0*0.83 + 59.3*0.56) /(182.0 + 0.56) = 0.765, 
60 60 40" 



Vo\(ADE) = det 



7 25 7 
1 1 1 



= 180. 



m (BDE ) = 0.695, Vo\(BDE ) = 230 . 

M({ADE, BDE}) = 297.42; 
m(ABD) = 0.269, Vol(ABD) = 180. 
m(ABE) = 0.492, Vol(ABE) = 230. 

M({ABD,ABE}) = 161.85. 

Since M ({ABD, ABE}) < M({ADE, BDE}), our calculation with the 
metric M agrees with our earlier conclusion that {ABD, ABE} is the 
preferred triangulation over {ADE, BDE } . 

IX. BUILDING PYRAMIDS 

One practical issue remains. In actual empirical studies, vertices 
become available one by one, and except when the first pyramid is 
formed, the pyramid building process is always applied to an existing 
system of pyramids. Especially in higher dimensions, we need an 
efficient and consistent method to incorporate new vertices into old 
systems of pyramids. 

Example: Suppose the vertices A, B, C, D in our computer performance 
example have been satisfactorily triangulated, and E appears as a new 
vertex, as shown in Fig. 10. What new triangulations are generated so 
that they may be compared? If the new vertex is inside some triangle 
[e.g., H = (55, 10) inside ABE], then an obvious reasonable answer 
exists (e.g., replace ABE by HBE, AHE, and ABH). In higher dimen- 
sions, the procedure is just as simple, so hereafter we will only consider 
new vertices not in the convex hull of the old vertices. We should 
specify that we start out with a minimal cover, that is, we start out 
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Fig. 10 — Adding a new vertex to a system of pyramids. 



from a system of pyramids containing the convex hull of the old points 
("cover") in such a way that any point is in the interior of at most one 
pyramid ("minimal"). For example, if triangle BCD were added to the 
triangulation above, we would no longer have a minimal cover, since 
some points close to B would be inside two triangles. We would also 
like to end up with a minimal cover after incorporating the new point 
E. By going from minimal cover to minimal cover, the method we 
develop could be used again when a new vertex (e.g., F) comes around. 
For the sake of consistency, we do not want to add any pyramid 
implicitly rejected by the minimal cover we started out with, such as 
BCD. We call new covers satisfying this desideratum consistent. The 
appropriate building block for the new system of pyramids turns out 
to be faces. 

Definition: Let {xi, • • • , x#} C R" be the set of vertices from the 
starting minimal cover, and let p be a point outside its convex hull. A 
new face is the convex hull of {p, x,,, • • • ,x; n _,} for any sequence 1 ^ 
i a < • • • < i„_i = K such that: 

(i) row rank of (x,, - p, • • • , x n _, — p) = n — 1, and 
(ii) no Xj (j i* ix, • • • , i n -\), is in the convex hull of {p, x,,, • • • , 

*„_,}. 
Condition (i) ensures that the new face does not collapse to some 

<(n — 1) dimensional surface. It may be checked by counting the 
nonzero eigenvalues of the covariance matrix (see Section VI). Con- 
dition (ii) rules out pathologies such as face DE' in the two-dimen- 
sional case pictured in Fig. 11. (AE', of course, is legitimate.) 
Proposition 5: Let F be the set of old faces in the starting minimal 
cover, and N be the set of new faces. If S is any consistent minimal 
cover for the new convex hull whose set of faces is denoted by F, then 
(i) F is a subset of F U N, 
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OLD VERTICES 




*f NEW VERTEX 

Fig. 11— The pathological triangle ADE'. 



(ii) Any two faces from F do not intersect at interior points (though 
they may met at boundaries), and 

(Hi) F is a maximal set satisfying (i) and (ii), i.e., any face not in F 
either is not in Fq U N or meets some face from F at interior points. 
Proof: Check that the faces from any minimal cover must satisfy (ii). 

Thus, assuming that we can enumerate all sets of faces satisfying (i) 
through (Hi), we may proceed to reconstruct minimal covers from 
these sets, and Proposition 5 tells us that we would have captured all 
the consistent new minimal covers we want this way. Incidentally, 
Proposition 5 does not rule out the possibility that we may catch more 
than the new minimal covers or that some sets of faces may not be 
reconstructible into new minimal covers. These situations are dealt 
with by the converse to Proposition 5, which remains to be proven. 
Therefore, we should guard ourselves against the first possibility and 
not be surprised at the second. 

The construction of all sets satisfying (i) to (Hi) can be done neatly 
along a downward-growing binary tree whose branches enumerate the 
sets. 

Example: Continuing the preceding example, it is clear that Fo = {AB, 
AC, AD, BC, BD] and N = {EA, EB, EC, ED). Next, we list pairs of 
faces from F U N that meet at interior points in Table III. We now 
grow our tree. The trunk of our tree contains those faces meeting no 
others. Beyond that, we split the tree into two branches each time we 
choose between a face and any one of the faces it meets. We extend 
each branch as far as possible as long as it does not contain pairs of 
intersecting faces. At the end of each branch we reconstruct a new 
minimal cover from the faces listed on the branch; the result is 
displayed in Fig. 12. Thus, the leftmost branch {AD, BC, BD, EA, EC, 
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Table III — Pairs of 

intersecting faces for 

typical problem 



Face 


Meets Face(s) 


AB 


ED 


AC 


EB.ED 


AD 


— 


BC 


— 


BD 


— 


EA 


— 


EB 


AC 


EC 


— 


ED 


AB,AC 




< ABD, ACE. ABC > 

Fig. 12 — Tree from new vertex E. 

AB, AC) lists one of the possible sets of faces that contains as many 
faces as possible without including two that intersect at interior points. 
It is easy to see that the branch is an enumeration of all faces from the 
triangulation {ABD, ACE, ABC}. The other two branches are ob- 
tained similarly. Using the evaluation techniques from Sections V, VI, 
and VIII, the triangulation reconstructed from the faces listed in the 
middle branch was chosen. 

The only step in this procedure, which is nontrivial in higher 
dimensions, is the tabulation of pairs of faces that intersect at interior 
points. Especially where there are many pairs of faces, making a pair- 
by-pair determination can be cumbersome. 

Notation: Let e, G R" be the vector with in every component except 
the ith, where it is 1. 

Proposition 6: Let F = {xi, ■ • • , x„}, G = {y u • • • , y„) C R" be faces 
from F U N. Choose any j such that (xi - x„, • • • , x„-i — x„, e,) is 
invertible, and let 

p'(F) = eUxi - x„, • • • , x n -j - x„, e,)- 1 E R n . 

(If F is a face, such j exists.) Similarly, define p'(G). Then F and G 
have empty interior intersections if and only if either 
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(j) p'(F)xi < min{p'(F)yi, •••, p'(F)y n ) or p'(F) Xl > 
max{p'(F)yi, • • ■ , p'(F)y„}, or 

(«) p'(G)y, < min{p'(G)xi, •■•, p'(G)x„} or p'(G) yi > 
max{p'(G)x 1 , ...,p'(G)x„}. 

Proof: The clue is to recognize that p(F) is simply some vector 
perpendicular to F and p'(F)y is the unsealed projection of y onto 
p(F). Clearly, two faces have disjoint interiors if and only if the two 
faces have disjoint projections along some direction perpendicular to 
one of them. 

Example: We leave the response time function h so that we may 
illustrate application of Proposition 6 in three dimensions. We are 
given six vertices in R 3 , and we wish to determine how the 20 resulting 
faces intersect in pairs. 



B = 



D=\ 5 { E = \ 6 1 F - ( 6 J . 

Table IV contains the results of the calculations according to Propo- 
sition 6. The last column of the table, where filled, lists the pairs of 
faces with disjoint interiors that have been identified. There are five 
such pairs. For example, the first row shows, using a projection 
perpendicular to ABC, that ABC and DEF do not meet, since 30 < 
min{80, 60, 69}. Note that A, B, and C have the same projection, 30, 
since the projection is perpendicular to ABC. Note also that BCD and 
BCE yield the same rows on Table IV. This is because B, C, D, and E 
He on the same plane. By Proposition 6, the five other pairs of faces 
must meet at interior points. 

Because only six points were involved, each row in the table could 
only be used to separate two three-pointed faces. If more points were 
involved, more than one pair of faces could be found disjoint at once. 
For example, if we had two additional points G and H, and the first 
row read as in Table V, then the following pairs of faces may imme- 
diately be identified as disjoint: 

ABC, DEF ABH, DEF AHC, DEF 

ABC, EFG ABH, EFG AHC, EFG 

ABC, DFG ABH, DFG AHC, DFG 

ABC, DEG ABH, DEG AHC, DEG, • • • 

This method clearly beats having to test each pair separately. 
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Table IV — Determining how 20 faces meet in pairs 



FaceF 


p'(F)A 


P'(F)B 


p'(F)C 


P'(F)D 


P'(F)E 


p'(F)F 




ABC 


30.0 


30.0 


30.0 


80.0 


60.0 


69.0 


ABC, DEF 


ABD 


17.5 


17.5 


5.0 


17.5 


-15.0 


-6.0 


ABD, CEF 


ABE 


24.0 


24.0 


18.0 


50.0 


24.0 


33.0 




ABF 


22.2 


22.2 


14.4 


41.0 


13.2 


22.2 




ACD 


30.0 


5.0 


30.0 


30.0 


85.0 


6.9 




ACE 


30.0 


60.0 


30.0 


140.0 


30.0 


69.0 


ACE, BDF 


ACF 


13.3 


0.0 


6.7 


0.0 


0.0 


6.0 


ACF, BDE 


ADE 


22.1 


12.9 


14.3 


22.1 


22.1 


21.9 


ADE, BCF 


ADF 


22.2 


12.8 


14.4 


22.2 


22.6 


22.2 




AEF 


22.2 


13.2 


14.4 


23.0 


22.2 


22.2 




BCD 


20.0 


10.0 


10.0 


10.0 


10.0 


9.0 




BCE 


20.0 


10.0 


10.0 


10.0 


10.0 


9.0 




BCF 


20.3 


10.5 


10.5 


11.8 


11.3 


10.5 


ADE, BCF 


BDE 


20.0 


10.0 


10.0 


10.0 


10.0 


9.0 




BDF 


20.1 


9.7 


10.2 


9.7 


11.1 


9.7 


ACE, BDF 


BEF 


20.4 


11.4 


10.8 


14.0 


11.4 


11.4 




CDE 


20.0 


10.0 


10.0 


10.0 


10.0 


9.0 




CDF 


20.3 


9.9 


10.5 


10.5 


11.9 


10.5 




CEF 


20.3 


11.3 


10.5 


13.3 


10.5 


10.5 


ABD, CEF 


DEF 


23.0 


14.0 


16.0 


27.0 


27.0 


27.0 


ABC, DEF 


Table V- 


-Separating many pairs 


of faces 


simultaneously 


FaceF 


P'(F)A 


P'(F)B 


p'(F)C 


p'(F)D 


p'(F)E p'(,F)F p 


'(F)G p'(F)H 


ABC 


30 


30 


30 


80 


60 


69 


70 25 



X. RECAPITULATION 

The cycle is now complete. We start out with some minimal cover — 
possibly consisting of a single pyramid— on the x-space of some func- 
tion ;y = /"(x). The behavior of the function on the minimal cover may 
be explored by linear interpolation as discussed in Section II. If further 
information is subsequently found wanting, a new interpolant is cho- 
sen, and a new data point is empirically acquired. Using the technique 
from the preceding section, we are able to list all consistent ways to 
extend the starting rninimal cover into a new minimal cover that 
incorporates the new interpolant. These new minimal covers should 
be evaluated both in terms of their faces and in terms of the axes of 
their pyramids. For either evaluation, the basic ingredient is the notion 
of steepest ascent. Through this notion the values of the function at 
the interpolants are taken into account. This distinguishes our ap- 
proach from other schemes where only the x-space gets examined. 
Eventually, a best new minimal cover is chosen, and we are ready, 
once more, to interpolate and explore the function. 
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